Abstract



A Bloom filter is a method for reducing the space (memory) required for representing a set by allowing a small error probability. In this paper we consider a Sliding Bloom Filter: a data structure that, given a stream of elements, supports membership queries of the set of the last $n$ elements (a sliding window), while allowing a small error probability. We formally define the data structure and its relevant parameters and analyze the time and memory requirements needed to achieve them. We give a low space construction that runs in $O(1)$ worst case time with high probability and provide an almost matching lower bound on the space that shows that our construction has the best possible space consumption up to an additive lower order term.

1 Introduction

Given a stream of elements, we consider the task of determining whether an element has appeared in the last $n$ elements of the stream. To accomplish this task, one must maintain a representation of the last $n$ elements at each step. One issue is that the memory required to represent them might be too large, and hence an approximation is used. We formally define this approximation and completely characterize the space and time complexity needed for the task.

In 1970 Bloom [Blo70] suggested an efficient data structure, known as the 'Bloom filter', for reducing the space required for representing a set $S$ by allowing a small error probability on membership queries. The problem is also known as the approximate membership problem (however, we refer to any solution simply as a 'Bloom filter'). A solution is allowed an error probability of $\varepsilon$ for elements not in $S$ (false positives), but no errors for members of $S$. In this paper, we consider the task of efficiently maintaining a Bloom filter of the last $n$ elements (called 'the sliding window') of a stream of elements.

We define an $(n, m, \varepsilon)$-sliding Bloom filter as the task of maintaining a Bloom filter over the last $n$ elements. The answer on these elements must always be 'Yes', the $m$ elements that appear prior to them have no restrictions, and for any other element the answer must be 'Yes' with probability at most $\varepsilon$. In case $m$ is infinite, all elements prior to the current window have no restrictions. In this case we write for short $(n, \varepsilon)$-sliding Bloom filter.

The problem was studied in several communities and various solutions were suggested. In this work, we focus on a theoretical analysis of the problem and provide a rigorous analysis of the space and time needed for solving the task. We construct a sliding Bloom filter with $O(1)$ query and update time, where the running time is worst case with high probability over the stream (see the theorems in Section 1.2 for precise definitions), and which has near optimal space consumption. We prove a matching space lower bound that is tight with our construction up to an additive lower order term. Our algorithms make use of a dynamic dictionary, and given an implementation of one, it is relatively simple to complete the algorithm's implementation.

A simple solution to the task is to partition the window into blocks of size $m$ and for each block maintain its own Bloom filter. This results in maintaining $\lceil n/m + 1 \rceil$ Bloom filters. To determine if an element appeared or not we query all the Bloom filters and answer 'Yes' if any of them answered positively. There are immediate drawbacks of this solution, even assuming the Bloom filters are optimal in space and time:

• Slow query time: $\lceil n/m + 1 \rceil$ Bloom filter lookups.

• High error probability: since an error can occur on each block, to achieve an effective error probability of $\varepsilon$ we need to set each Bloom filter to have error $\varepsilon' = \frac{\varepsilon m}{n+m}$, which means that the total space used has to grow (relative to a simple Bloom filter) by roughly $n \log \frac{n+m}{m}$ bits (see Section 1.3).

• Sub-optimal space consumption for large $m$: the first two drawbacks are acute for small $m$, but when $m$ is large, say $n = m$, then each block is large, which results in a large portion of the memory being 'wasted' on old elements.
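
To make the second drawback concrete, the following sketch works out the per-block error and the resulting space overhead under a union bound. The function name `block_scheme_cost` and the sample parameters are illustrative assumptions, and the overhead relies on the standard estimate of about $\log_2(1/\text{error})$ bits per element for a space-optimal filter.

```python
import math

def block_scheme_cost(n: int, m: int, eps: float) -> dict:
    """Cost of the naive block-partitioned scheme under a union bound."""
    filters = math.ceil(n / m) + 1      # Bloom filters queried on every lookup
    eps_block = eps / filters           # per-filter error so the total stays <= eps
    # A space-optimal filter spends about log2(1/error) bits per element, so
    # tightening the error by a factor of `filters` costs log2(filters) extra
    # bits for each of the n window elements.
    extra_bits = n * math.log2(filters)
    return {"filters": filters, "eps_block": eps_block, "extra_bits": extra_bits}

# Hypothetical parameters: a window of one million elements, blocks of 1000.
cost = block_scheme_cost(n=1_000_000, m=1_000, eps=0.01)
```

With these sample values every query touches about a thousand filters, which illustrates why the block scheme is only reasonable when $m$ is comparable to $n$.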

Sliding Bloom filters can be used in a wide range of applications and we discuss two settings where they 
are applicable and have been suggested. In one setting, Bloom filters are used to quickly determine whether
an element is in a local cache [FCAB00], instead of querying the cache which may be slow. Since the cache
has limited size, it usually keeps the most recently used items (LRU eviction policy). A sliding Bloom filter is used to
represent the last n elements used and thus, maintain a representation of the cache's contents at any point 
in time. 

Another setting consists of the task of identifying duplicates in streams. In many cases, we consider the 
stream to be unbounded, which makes it impractical to store the entire data set and answer queries precisely 
and quickly. Instead, it may suffice to find duplicates over a sliding window while allowing some errors. In 
this case, a sliding Bloom filter (with m set to infinity) suffices and in fact, we completely characterize the 
space complexity needed for this problem. 

1.1 Problem Definition 

Given a stream of elements $\sigma = x_1, x_2, \ldots$ from a finite universe $U$ of size $u$, parameters $n$, $m$ and $\varepsilon$, we want to approximately represent a sliding window of the $n$ most recent elements of the stream. An algorithm $A$ is given the elements of the stream one by one, and does not have access to previous elements that were not stored explicitly. Let $\sigma_t = x_1, \ldots, x_t$ be the first $t$ elements of the stream $\sigma$ and let $\sigma_t(k) = x_{\max(0, t-k+1)}, \ldots, x_t$ be the last $k$ elements of the stream $\sigma_t$. At any step $t$ the current window is $\sigma_t(n)$ and the $m$ elements before them are $\sigma_{t-n}(m)$. If $m = \infty$ then define $\sigma_{t-n}(m) = x_1, \ldots, x_{t-n}$. Denote by $A(\sigma_t, x) \in \{\text{'Yes'}, \text{'No'}\}$ the result of the algorithm on input $x$ given the stream $\sigma_t$. We call $A$ an $(n, m, \varepsilon)$-sliding Bloom filter if for any $t \ge 1$ the following two conditions hold:

1. For any $x \in \sigma_t(n)$: $\Pr[A(\sigma_t, x) = \text{'Yes'}] = 1$

2. For any $x \notin \sigma_t(n + m)$: $\Pr[A(\sigma_t, x) = \text{'Yes'}] \le \varepsilon$

where the probability is taken over the internal randomness of the algorithm $A$. Notice that for an element $x \in \sigma_{t-n}(m)$ the algorithm may answer arbitrarily (no restrictions). See Figure 1.
Figure 1: The sliding window of the last $n$ and $n + m$ elements
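
The three regions of the definition can be spelled out with a small reference checker. This is only an illustrative sketch that stores the whole stream explicitly (exactly what a sliding Bloom filter is meant to avoid); the function name `requirement` and its return strings are assumptions made for the example.

```python
def requirement(stream, t, n, m, x):
    """Which guarantee applies to query x after the first t stream elements.

    `stream` is a Python list holding x_1, x_2, ... at indices 0, 1, ...;
    `m` may be float('inf') for the (n, eps) variant.
    """
    prefix = stream[:t]                     # sigma_t
    window = prefix[max(0, t - n):]         # sigma_t(n): the last n elements
    if x in window:
        return "must answer Yes"
    older = prefix[:max(0, t - n)]          # everything before the window
    if m != float("inf"):
        older = older[max(0, len(older) - m):]   # sigma_{t-n}(m)
    if x in older:
        return "unrestricted"
    return "Yes with probability at most eps"

stream = list("abcdefgh")
examples = [requirement(stream, 8, 3, 2, x) for x in "gda"]
```

Here `g` lies in the window, `d` lies in the unrestricted zone of $m = 2$ elements before it, and `a` is old enough that only a bounded false positive probability is allowed.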

An algorithm $A$ for solving the problem is measured by its memory consumption and by the time it takes to process each element and to answer a query. We denote by $|A|$ the maximum number of bits used by $A$ at any step. The model we consider is the unit cost RAM model in which the elements are taken from a universe of size $u$, and each element can be stored in a single word of length $w = \log u$ bits. Any operation in the standard instruction set can be executed in constant time on $w$-bit operands. This includes addition, subtraction, bitwise Boolean operations, left and right bit shifts by an arbitrary number of positions, and multiplication. The unit cost RAM model is considered the standard model for the analysis of the efficiency of data structures.

1.2 Our Contributions 

We provide tight upper and lower bounds to the $(n, m, \varepsilon)$-problem. Our first contribution is a construction of an efficient sliding Bloom filter: it has query time $O(1)$ worst case and update time $O(1)$ worst case with high probability. For $\varepsilon = o(1)$ the space consumption is near optimal.

Theorem 1.1. For any $m > 0$, $\varepsilon = o(1)$ and sufficiently large $n$ there exists an $(n, m, \varepsilon)$-sliding Bloom filter having the following space and time complexity on a unit cost RAM:

1. Query time is $O(1)$ worst case. For any polynomial $p(n)$ and sequence of at most $p(n)$ operations, with probability at least $1 - 1/p(n)$ over the internal randomness of the data structure all insertions are performed in time $O(1)$ worst case.

2. Space consumption is $(1 + o(1)) \left( n \log \frac{1}{\varepsilon} + \max \left\{ n \log\log \frac{1}{\varepsilon},\ n \log \frac{n}{m} \right\} \right)$.

3. If $m = \infty$ then the space is $(1 + o(1)) \left( n \log \frac{1}{\varepsilon} + n \log\log \frac{1}{\varepsilon} \right)$.

Our second contribution is a matching space lower bound. We prove that if $\varepsilon = o(1)$ then any sliding Bloom filter must use space that is within an additive low order term of the space of our construction, regardless of its running time. We assume that the sliding Bloom filter has the desired property that at any point the number of false positives is not too large (the 'absolute false positive assumption'¹). In Appendix A we state and prove a slightly weaker result without this assumption.

Theorem 1.2. For any $m > 0$, $\varepsilon = o(1)$, sufficiently large $n$ and an $(n, m, \varepsilon)$-sliding Bloom filter $A$, if we assume that for any stream $\sigma$ it holds that $\Pr[\exists i \le 3n : |\{x \in U : A(\sigma_i, x) = \text{'Yes'}\}| \ge 3\varepsilon u] \le \frac{1}{2}$ then

1. $|A| \ge n \log \frac{1}{\varepsilon} + \max \left\{ n \log\log \frac{1}{\varepsilon},\ n \log \frac{n}{m} \right\} + O(n)$

2. If $m = \infty$ then $|A| \ge n \log \frac{1}{\varepsilon} + n \log\log \frac{1}{\varepsilon} + O(n)$
From Theorems 1.1 and 1.2 we conclude that making $m$ larger than $n / \log \frac{1}{\varepsilon}$ does not make sense: one gets the same result for any value in $[n / \log \frac{1}{\varepsilon}, \infty)$. The lower bound is proved by an encoding argument, which is a common way of showing lower bounds in this area (see for example [PSW13]). Specifically, the idea of the proof is to use $A$ to encode a set $S$ and a permutation $\pi$ on the set corresponding to the order of the elements in the set. We consider the number of steps from the point an element is inserted to $A$ to the first point where $A$ answers 'No' on it, and we define $\lambda$ to be the sum of $n$ such lengths. If $\lambda$ is large, then there is a point where $A$ represents a large portion of $S$, which benefits the encoding of $S$. If $\lambda$ is small, then $A$ can be used as an approximation of $\pi$, thus encoding $\pi$ precisely requires a small number of bits. In either case, the encoding must be larger than the entropy lower bound, which yields a bound on the size of $A$. The optimal value of the trade-off between representing a larger set or representing a more accurate ordering is achieved by our construction. In this sense, our upper bound and lower bound match not only by 'value' but also by 'structure'.



¹ The absolute false positive assumption is a very desirable property of a Sliding Bloom Filter, and reasonable constructions enjoy it. We use it in the proof in Section 3, and in Appendix A we show how to get arbitrarily close to the lower bound without it. An example of a Sliding Bloom Filter for which the assumption does not hold can be obtained by taking any $(n, \varepsilon)$-Sliding Bloom Filter and modifying it such that it chooses a random index $k \in [1, n]$ and at step $k$ of the stream it always answers 'Yes'. This results in an $(n, \varepsilon + \frac{1}{n})$-Sliding Bloom Filter in which there will always be some step at which the false positive rate is high (it is 1). One could add the assumption as a requirement to the definition of a Sliding Bloom Filter, and it is not clear if a Sliding Bloom Filter can benefit from the absence of such a requirement.



1.3 Related Work and Background 

The data structure for approximate set membership (Bloom filter) as suggested by Bloom in 1970 [Blo70] is relatively simple: it consists of a bit array which is initialized to '0' and $k$ random hash functions. Each element is mapped to $k$ locations in the bit array using the hash functions. To insert an element, set all $k$ locations to 1. On lookup, return 'Yes' if all $k$ locations are 1. To achieve an error probability of $\varepsilon$ for a set of size $n$, Bloom showed that if $k = \log \frac{1}{\varepsilon}$ then the length of the bit array should be $\approx 1.44\, n \log \frac{1}{\varepsilon}$. Since its introduction the Bloom filter has been investigated extensively and many variants, implementations and applications of it have been suggested. A comprehensive survey (for its time) is Broder and Mitzenmacher [BM02].
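
As a point of reference, the classical structure can be sketched in a few lines. This is a minimal illustration and not the construction of this paper: the class name `BloomFilter`, the use of salted SHA-256 as a stand-in for $k$ random hash functions, and storing one byte per bit are all assumptions made for readability.

```python
import hashlib
import math

class BloomFilter:
    """Classical Bloom filter with k = ceil(log2(1/eps)) hash functions."""

    def __init__(self, n: int, eps: float):
        self.k = max(1, math.ceil(math.log2(1 / eps)))
        self.size = max(1, math.ceil(1.44 * n * math.log2(1 / eps)))
        self.bits = bytearray(self.size)    # one byte per bit, for clarity only

    def _positions(self, item: str):
        # Salted SHA-256 stands in for k independent random hash functions.
        for seed in range(self.k):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def insert(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos] = 1

    def lookup(self, item: str) -> bool:
        # 'Yes' only if all k locations are set; members are never rejected.
        return all(self.bits[pos] for pos in self._positions(item))

bf = BloomFilter(n=100, eps=0.01)
words = [f"item{i}" for i in range(100)]
for w in words:
    bf.insert(w)
```

Note that this structure has no delete operation, which is the basic obstacle to using it directly over a sliding window.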

A lot of attention was devoted to determining the exact space and time requirements of the approximate set membership problem. Carter et al. [CFG+78] proved an entropy lower bound of $n \log \frac{1}{\varepsilon}$, when the universe $U$ is large. They also provided a reduction from approximate membership to exact membership, which we use in our construction. The retrieval problem associates additional data with each element of the set. In the static setting, where the elements are fixed and given in advance, Dietzfelbinger and Pagh propose a reduction from the retrieval problem to approximate membership [DP08]. Their construction gets arbitrarily close to the entropy lower bound. In the dynamic case, Lovett and Porat [LP10] proved that the entropy lower bound cannot be achieved for any constant error rate. They show a lower bound of $C(\varepsilon) \cdot n \log \frac{1}{\varepsilon}$ where $C(\varepsilon) > 1$ depends only on $\varepsilon$. Pagh, Segev and Wieder [PSW13] showed that if the size $n$ is not known in advance then at least $(1 - o(1))\, n \log \frac{1}{\varepsilon} + \Omega(n \log\log n)$ bits of space must be used.

Pagh, Pagh and Rao [PPR05] used the reduction of Carter et al. to improve the original Bloom filter in several ways: lookup time becomes $O(1)$ independent of $\varepsilon$, and the structure has succinct space consumption, uses explicit hash functions and supports deletion. In the dynamic setting, for a constant $\varepsilon$ we do not know what the leading term in the memory needed is; however, for any sub-constant $\varepsilon$ we know that the leading term is $n \log \frac{1}{\varepsilon}$: Arbitman, Naor and Segev present a solution which is optimal up to an additive lower order term (i.e., it is a succinct representation) [ANS10]. Thus, in this work we focus on sub-constant $\varepsilon$.

The model of sliding windows was first introduced by Datar et al. [DGIM02]. They consider maintaining
an approximation of a statistic over a sliding window. They provide an efficient algorithm along with a 
matching lower bound. 

Data structures similar to the sliding Bloom filter have been studied in the literature. The simple solution using $m = n$ consists of two large Bloom filters which are used alternately. This method, known as double buffering, was proposed for packet classification caches [CFL04]. Yoon [Yoo10] improved this method by using the two buffers simultaneously to increase the capacity of the data structure. Deng and Rafiei [DR06] introduced the Stable Bloom filter and used it to approximately detect duplicates in streams. Instead of a bit array they use an array of counters, and to insert an element they set all associated counters to the maximal value. At each step, they randomly choose counters to decrease, and hence older elements have a higher probability of being decreased and eventually evicted over time. Metwally et al. [MAA05] showed how to use Bloom filters to identify duplicates in click streams. They considered three models: Sliding Windows, Landmark Windows and Jumping Windows, and discussed their relations. A comprehensive survey including many variations is given by Tarkoma et al. [TRL12]. However, as far as we can tell, neither a formal definition of a sliding Bloom filter nor a rigorous analysis of its space and time complexity has appeared before.

2 The Construction of a Succinct Sliding Bloom Filter 

Our algorithm applies the construction of an approximate membership data structure for a set $S$ using the reduction from approximate membership to exact membership [CFG+78]. On an input $x$, we store $h(x)$, for some hash function $h$, in a dynamic dictionary and in addition some information on the last time where $x$ appeared. We consider the stream to be divided into generations of size $n/c$ each, where $c$ is a parameter that will be optimized later. The first $n/c$ elements are generation 1, the next $n/c$ elements are generation 2, etc. The current window contains the last $n$ elements and consists of at most $c + 1$ different generations. Therefore, at each step, we maintain a set $S$ that represents the last $c + 1$ generations (that is, at most $n + n/c$ elements) and count the generations mod $c + 1$. In addition to storing $h(x)$, we associate $s = \log(c + 1)$ bits indicating the generation of $x$. Every $n/c$ steps, we delete elements associated with the oldest generation. Finally, we adjust $c$ to optimize the space consumption while requiring $n/c \le m$.



In the rest of this section, we describe the algorithm in more detail. We first present the reduction from 
approximate to exact membership. We define a dynamic dictionary and the properties we need from it in 
order to implement our algorithm. Then, we describe the algorithm in two stages, using any dictionary as 
a black box. The memory consumption is merely the memory of the dictionary and therefore we use one 
with succinct representation. At first, the running time will not be optimal and will depend on $c$ (which is not
a constant), even if we use optimal dictionaries. Then, we describe how to eliminate the dependency on $c$
as well as deamortizing the algorithm, making the running time constant for each operation. This includes
augmenting the dictionary, and thus it can no longer be treated as a black box. We prove correctness and 
analyze the resulting memory consumption and running time. 

2.1 A Reduction to Exact Membership 

We want to represent a set $S$ of size $n$ and support membership queries in the following manner: for a query on $x \in S$ we answer 'Yes' and for $x \notin S$ we answer 'Yes' with probability at most $\varepsilon$. Choose a hash function $h \in \mathcal{H}$ from a universal family of hash functions mapping $U \to [n/\varepsilon]$. Then for any $S$ of size at most $n$ it holds that for any $x \in U \setminus S$:

$$\Pr_h[h(x) \in h(S)] \le \sum_{y \in S} \Pr_h[h(x) = h(y)] \le n \cdot \frac{\varepsilon}{n} = \varepsilon,$$

where the first inequality comes from a union bound and the second from the definition of a universal hash family. This implies that storing $h(S)$ suffices for solving the approximate membership problem. To store $h(S)$ we use an exact dictionary $\mathcal{D}$, which supports insert (including associated data), delete and update procedures (the update procedure can be simulated by a delete followed by an insert). While most dictionaries support these basic procedures, we require $\mathcal{D}$ to additionally support the ability of scanning. We further discuss these properties.
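
The reduction can be illustrated with a short sketch. It assumes integer keys below $2^{61} - 1$, uses the Carter-Wegman family $h(x) = ((ax + b) \bmod p) \bmod M$ as one example of a universal family, and lets a plain Python `set` stand in for the exact dictionary; the class names `UniversalHash` and `ApproxSet` are hypothetical.

```python
import math
import random

class UniversalHash:
    """h(x) = ((a*x + b) mod p) mod M for integer keys x < p."""
    P = (1 << 61) - 1                      # a Mersenne prime

    def __init__(self, M: int, rng: random.Random):
        self.M = M
        self.a = rng.randrange(1, self.P)
        self.b = rng.randrange(0, self.P)

    def __call__(self, x: int) -> int:
        return ((self.a * x + self.b) % self.P) % self.M

class ApproxSet:
    """Approximate membership by storing h(S) in an exact dictionary."""

    def __init__(self, n: int, eps: float, seed: int = 0):
        # Range of size about n/eps, so a union bound over at most n stored
        # fingerprints gives a false positive probability of roughly eps.
        self.h = UniversalHash(math.ceil(n / eps), random.Random(seed))
        self.stored = set()                # stand-in for the exact dictionary

    def insert(self, x: int) -> None:
        self.stored.add(self.h(x))

    def lookup(self, x: int) -> bool:
        return self.h(x) in self.stored

s = ApproxSet(n=1000, eps=0.01)
for x in range(1000):
    s.insert(x)
```

A Python `set` is of course far from succinct; the point of the next subsection is that the stand-in must be replaced by a dictionary whose space is close to the information-theoretic minimum.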

2.2 Succinct Dynamic Dictionary 

The information-theoretic lower bound on the minimum number of bits needed to represent a set $S$ of size $n$ out of $M$ different elements is $B = B(M, n) = \log \binom{M}{n} = n \log M - n \log n + O(n)$. A succinct representation is one that uses $(1 + o(1))B$ bits [Dem07]. A significant amount of work was devoted to constructing dynamic dictionaries over the years and most of them are appropriate for our construction. Some have good theoretical results and some emphasize the actual implementation. In order for the reduction to compete with the Bloom filter construction (in terms of memory consumption) we must use a dynamic dictionary with succinct representation. There are several different definitions in the literature for a dynamic dictionary. A static dictionary is a data structure storing a finite subset of a universe $U$, supporting only the member operation. In this paper, we refer to a dynamic dictionary where only an upper bound $n$ on the size of $S$ is given in advance and it supports the procedures member, insert and delete. The memory of the dictionary is measured with respect to the bound $n$.

In addition to storing $h(S)$, we assume $\mathcal{D}$ supports associating data with each element. Specifically, we
want to store s-bits of data with each element, where s is fixed and known in advance. Finally, we assume 
the dictionary supports scanning, that is, the ability to go over the associated data of all elements of the 
dictionary, and delete the element if needed. Using the scanning process, we scan the generations stored in 
the dictionary and delete elements of specific generations. 

Several dynamic dictionaries can be used in our construction of a Sliding Bloom Filter. The running time and space consumption are directly inherited from the dictionary, making it an important choice. We use the construction of [ANS10] (but other alternatives are possible). It supports insert and delete in $O(1)$ worst case with high probability while having a succinct representation. Implicitly in their work, they support associating any fixed number of bits and scanning. When $s$ bits of data are associated with each $x \in S$, the representation lower bound becomes $B + ns$ bits. For concreteness, the memory consumption of
their dictionary is $(1 + o(1))(B + ns)$, where the $o(1)$ hides the expression $\frac{\log\log n}{\log^{1/3} n}$.
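
The bound $B(M, n)$ and its approximation can be checked numerically. The sketch below is only an illustration; the function names are assumptions, and the sandwich $(M/n)^n \le \binom{M}{n} \le (eM/n)^n$ is the standard estimate behind the $O(n)$ term.

```python
import math

def info_bound_bits(M: int, n: int) -> float:
    """B(M, n) = log2 of the number of n-element subsets of an M-element universe."""
    return math.log2(math.comb(M, n))

def succinct_target_bits(M: int, n: int, s: int) -> float:
    """Lower bound B + n*s when s bits of data are attached to each element."""
    return info_bound_bits(M, n) + n * s

M, n = 10**6, 1000
exact = info_bound_bits(M, n)
approx = n * math.log2(M) - n * math.log2(n)   # n log M - n log n
# The gap exact - approx is between 0 and n * log2(e), i.e. it is O(n).
gap_per_element = (exact - approx) / n
```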



2.3 Algorithm with Dependency on $\varepsilon$

Initiate a dynamic dictionary $\mathcal{D}$ of size $n' = n\left(1 + \frac{1}{c}\right)$ as described above. Let $\mathcal{H} = \{h : U \to [n'/\varepsilon]\}$ be a family of universal hash functions and pick $h \in \mathcal{H}$ at random. At each step maintain a counter $\ell$ indicating the current generation and a counter $i$ indicating the current element in the generation. At every step $i$ is increased, and every $n/c$ steps $i$ is reset back to 0 and $\ell$ is increased mod $c + 1$.

To insert an element $x$, check if $h(x)$ exists in $\mathcal{D}$. If not, then insert $(h(x), \ell)$ (insert $h(x)$ associated with $\ell$) into $\mathcal{D}$. If $h(x)$ is in $\mathcal{D}$, then update the associated data of $h(x)$ to $\ell$. Finally, update the counters $i$ and $\ell$. If $\ell$ has increased (which happens every $n/c$ steps) then scan $\mathcal{D}$ and delete all elements with associated data equal to the new value of $\ell$.

To query the data structure on an element $x$, return whether $h(x)$ is in $\mathcal{D}$. See Appendix B for pseudo-code of the insert and lookup procedures (see Algorithm 1).

Insert(x):
    if h(x) is a member of $\mathcal{D}$ then
        update h(x) to have data $\ell$
    else
        insert (h(x), $\ell$) into $\mathcal{D}$
    end if
    maintain counters $i$ and $\ell$
    if the value of $\ell$ has changed then
        scan $\mathcal{D}$ and delete elements of generation $\ell$
    end if

Lookup(x):
    procedure member(x)
        if h(x) is a member of $\mathcal{D}$ then
            return 'Yes'
        else
            return 'No'
        end if
    end procedure

Algorithm 1: Pseudo-code of the Insert and Lookup procedures
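
The procedures of Algorithm 1 can be sketched in runnable form as follows. This is an illustration under stated assumptions rather than the succinct construction: a Python `dict` stands in for the dictionary, keys are integers, the hash is the Carter-Wegman family $((ax + b) \bmod p) \bmod M$, and $c$ is assumed to divide $n$. The class name `SlidingBloomFilter` is hypothetical.

```python
import math
import random

P = (1 << 61) - 1   # Mersenne prime for the illustrative universal hash

class SlidingBloomFilter:
    def __init__(self, n: int, c: int, eps: float, seed: int = 0):
        assert n % c == 0, "sketch assumes c divides n"
        self.gen_size = n // c                # elements per generation
        self.num_gens = c + 1                 # generations are counted mod c + 1
        n_prime = n + self.gen_size           # n' = n (1 + 1/c)
        self.M = math.ceil(n_prime / eps)     # fingerprint range of size n'/eps
        rng = random.Random(seed)
        self.a, self.b = rng.randrange(1, P), rng.randrange(P)
        self.D = {}                           # fingerprint -> generation
        self.gen = 0                          # counter l: current generation
        self.pos = 0                          # counter i: position in generation

    def _h(self, x: int) -> int:
        return ((self.a * x + self.b) % P) % self.M

    def insert(self, x: int) -> None:
        self.D[self._h(x)] = self.gen         # insert, or update the generation
        self.pos += 1
        if self.pos == self.gen_size:         # every n/c steps
            self.pos = 0
            self.gen = (self.gen + 1) % self.num_gens
            # The "large step": scan D and evict the generation being reused.
            self.D = {f: g for f, g in self.D.items() if g != self.gen}

    def lookup(self, x: int) -> bool:
        return self._h(x) in self.D

n, c = 20, 4
sbf = SlidingBloomFilter(n, c, eps=0.01)
stream = list(range(200))
window_ok = True
for t, x in enumerate(stream, start=1):
    sbf.insert(x)
    window_ok &= all(sbf.lookup(y) for y in stream[max(0, t - n):t])
```

The full scan inside `insert` is what makes the running time depend on $c$; removing it is the subject of Section 2.4.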

Correctness: We first notice that $\mathcal{D}$ is used correctly and never represents a set of size larger than $n'$. In each step we either insert an element to generation $\ell$ or move an existing element to generation $\ell$. In any case, each generation consists of at most $n/c$ elements in $\mathcal{D}$. Every $n/c$ steps we evict a whole generation, assuring no more than $c + 1$ generations are present in the dictionary at once. Thus, at most $n'$ elements are represented at any given step.

Next we prove that for any time $t$ the two conditions hold. The first condition follows directly from the algorithm. Assume $h(x)$ is inserted with associated generation $\ell = j$. Notice that its associated generation can only increase. $h(x)$ will be deleted only when $\ell$ completes a full cycle and its value is $j$ again, which takes at least $n$ steps. Thus, for any $x \in \sigma_t(n)$, $h(x)$ is in $\mathcal{D}$ and the algorithm will always answer 'Yes'.

For the second condition assume that $x \notin \sigma_t(n + m)$ and notice that $n + m$ covers at least $c + 1$ generations (since $n/c \le m$). Assume w.l.o.g. that $S = \{y_1, \ldots, y_{n'}\}$ ($S$ could have less than $n'$ elements) is the set of elements represented in $\mathcal{D}$ at time $t$. Then $\Pr[h(x) = h(y_i)] \le \frac{\varepsilon}{n'}$ for all $i \in [n']$. Therefore, using a union bound we get that the total false positive probability is

$$\Pr[A(x) = \text{'Yes'}] = \Pr[h(x) \in h(S)] \le \sum_{i=1}^{n'} \Pr[h(x) = h(y_i)] \le \varepsilon.$$

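Memory consumption: The bulk of memory is used for storing $\mathcal{D}$. In addition, we need to store two counters $i$ and $\ell$ and the hash function $h$, which together take $O(\log n)$ bits. $\mathcal{D}$ stores $n'$ elements out of $M = \lceil n'/\varepsilon \rceil$ while associating each with $s = \log c$ bits. Using the dictionary of [ANS10] yields a total space of

$$(1 + o(1)) \left( B\!\left(\tfrac{n'}{\varepsilon}, n'\right) + n's \right) = (1 + o(1)) \cdot n \left(1 + \frac{1}{c}\right) \left( \log \frac{1}{\varepsilon} + \log c + 1 \right).$$

We minimize this expression, as a function of $c$, and get that the minimum is at the solution to $c - \log c = \log \frac{1}{\varepsilon} - 1$. An approximate solution is $c = \log \frac{1}{\varepsilon}$, and requiring that $n/c \le m$ yields that $c = \max\left\{\log \frac{1}{\varepsilon},\ n/m\right\}$ and the total space is

$$(1 + o(1)) \left( n \log \frac{1}{\varepsilon} + n \cdot \max\left\{ \log\log \frac{1}{\varepsilon},\ \log \frac{n}{m} \right\} \right).$$

If $m = \infty$ (or $m \ge n / \log \frac{1}{\varepsilon}$) then $c = \log \frac{1}{\varepsilon}$ and we get

$$(1 + o(1)) \left( n \log \frac{1}{\varepsilon} + n \log\log \frac{1}{\varepsilon} \right)$$
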
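as required. As mentioned, the $o(1)$ hides the term $\frac{\log\log n}{\log^{1/3} n}$; therefore, if $\varepsilon = 2^{-O\left(\log^{1/3} n / \log\log n\right)}$ then the space consumption can be written as $n \log \frac{1}{\varepsilon} + n \log\log \frac{1}{\varepsilon} + O(n)$, which matches the lower bound up to the $O(n)$ term.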
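
The choice of $c$ can be explored numerically. The sketch below evaluates the per-element factor $(1 + 1/c)(\log \frac{1}{\varepsilon} + \log c + 1)$ over integer values of $c$, subject to $n/c \le m$; the function names and the search range are assumptions made for the example, and the $o(1)$ factor is ignored.

```python
import math

def bits_per_element(c: int, eps: float) -> float:
    """(1 + 1/c) * (log2(1/eps) + log2(c) + 1), the leading space per window element."""
    return (1 + 1 / c) * (math.log2(1 / eps) + math.log2(c) + 1)

def choose_c(eps: float, n: int, m: float) -> int:
    """Integer c minimizing the space while respecting n/c <= m."""
    c_min = 1 if m == math.inf else max(1, math.ceil(n / m))
    # Heuristic search range: the unconstrained optimum is below log2(1/eps) + O(1).
    c_max = max(c_min, 4 * math.ceil(math.log2(1 / eps)) + 4)
    return min(range(c_min, c_max + 1), key=lambda c: bits_per_element(c, eps))

eps = 2.0 ** -20
best = choose_c(eps, n=10**6, m=math.inf)
approx = math.ceil(math.log2(1 / eps))        # the approximate solution c = log(1/eps)
ratio = bits_per_element(approx, eps) / bits_per_element(best, eps)
```

For this sample $\varepsilon$ the exact integer optimum is slightly below $\log \frac{1}{\varepsilon}$, but the space at $c = \log \frac{1}{\varepsilon}$ is within a fraction of a percent of it, which is consistent with using the approximate solution in the analysis.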

Running time: Assume that $\mathcal{D}$ supports $O(1)$ running time worst case for all procedures. The lookup procedure performs a single query to $\mathcal{D}$ and hence always runs in $O(1)$. In the insert procedure, every $n/c$ steps, the value of $\ell$ is updated and we scan all elements in $\mathcal{D}$, deleting old elements. For any other step, the running time is $O(1)$. Therefore, the total running time for $n/c$ steps is $O(n')$, which is $O(c)$ amortized running time. If $m \ge n / \log \frac{1}{\varepsilon}$ then $c = \log \frac{1}{\varepsilon}$ and the running time is $O(\log \frac{1}{\varepsilon})$, otherwise it is $O(\frac{n}{m})$, which in both cases is not constant. We now show how to eliminate the large step, making the running time $O(1)$ worst case. Using the dictionary of [ANS10] we get that the total running time including the dictionary's operations is $O(1)$ worst case with high probability (over the internal randomness of the dictionary).

2.4 Eliminating the Large Step and Reducing the Time 

We now modify the algorithm to eliminate the large step and reduce the time such that we get rid of the dependency on $\varepsilon$. These modifications require some additional properties from the dictionary. Later, for concreteness, we describe how to modify the dictionary of [ANS10] to support these properties.

We need the scanning process to support running in multiple steps while allowing other operations to run
in between. The scanning procedure should be able to save its state, then allow other operations to run and
finally resume its state and continue the scanning process. In case elements have been added, moved or 
deleted the scanning process should continue scanning all elements nevertheless. 

In general, modifying a dictionary to support this is hard as the dictionary might implicitly represent
many elements using the same memory space. However, it can be implemented easily assuming each element 
has a unique memory space in which it is (implicitly) represented, called a 'cell'. An insert or delete 
procedure may modify a constant number of cells. Elements of cells which were accessed are called the 
accessed elements. We assume the cells have some order in which we can scan them and save an index 
indicating the state of the scanning process using o{n) bits of memory (actually it is O(logn)). We assume 
that given a cell, we can figure out the associated data with the element of the cell and delete the element 
of this cell from the dictionary. 

Using these assumptions, we can eliminate the large step by scanning and deleting old generations over 
a process of many steps. An element is considered old if it is more than c + 1 generations older than the 
current generation counter, £. When scanning an element we examine its generation and delete the element 
if it is old. Instead of scanning all the $n'$ elements in one step, we scan two elements at each step and save
the scanning index. Thus, after $n'/2$ steps we scan all $n'$ elements of the dictionary. A problem that occurs is that
the dictionary can change during the process, which may result in the scanning missing cells. To
solve this, we modify the dictionary's insert and delete procedures to check whether any accessed element 
needs to be deleted. For example, an insert procedure may move an element from one cell to another, which 
was already scanned. Thus, before moving or changing a cell we scan it. This way, each element is scanned 
either by the scanning process or by an insert or delete procedure. 

This implies that an old element might be deleted only after $n'/2$ steps, which means that there could be up to $2c + 1$ different generations stored in $\mathcal{D}$ at the same time. We need to extend the range of the counter $\ell$ to loop between 0 and $2c + 1$ (instead of between 0 and $c + 1$), in order to represent all generations. Only the $c + 1$ most recent generations are considered active and the rest are slated to be deleted. We change the Lookup procedure to return 'Yes' on input $x$ only if $h(x)$ exists in $\mathcal{D}$ and its associated generation is active. These modifications have negligible effect on the memory consumption and running time and preserve the correctness of the algorithm.
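
The incremental scan can be sketched as follows. This is an illustration only and not the modified dictionary of [ANS10]: an array of small hash buckets stands in for the ordered 'cells', keys are integers, $c$ is assumed to divide $n$, and the generation counter ranges over $2c + 2$ values. Because entries never move between buckets in this stand-in, the scan cannot miss them, so the extra check inside the dictionary's insert and delete procedures is not needed here.

```python
import math
import random

P = (1 << 61) - 1   # Mersenne prime for the illustrative universal hash

class DeamortizedSlidingFilter:
    def __init__(self, n: int, c: int, eps: float, seed: int = 0):
        assert n % c == 0, "sketch assumes c divides n"
        self.gen_size = n // c
        self.active = c + 1                   # generations that may cover the window
        self.num_gens = 2 * c + 2             # extended counter range 0 .. 2c + 1
        n_prime = n + self.gen_size
        self.M = math.ceil(n_prime / eps)
        rng = random.Random(seed)
        self.a, self.b = rng.randrange(1, P), rng.randrange(P)
        self.buckets = [dict() for _ in range(n_prime)]   # stand-in for the cells
        self.scan = 0                         # saved state of the scanning process
        self.gen = 0
        self.pos = 0

    def _h(self, x: int) -> int:
        return ((self.a * x + self.b) % P) % self.M

    def _is_active(self, g: int) -> bool:
        return (self.gen - g) % self.num_gens < self.active

    def _sweep(self, idx: int) -> None:
        bucket = self.buckets[idx]
        for f in [f for f, g in bucket.items() if not self._is_active(g)]:
            del bucket[f]                     # delete entries of old generations

    def insert(self, x: int) -> None:
        f = self._h(x)
        self.buckets[f % len(self.buckets)][f] = self.gen
        self.pos += 1
        if self.pos == self.gen_size:
            self.pos = 0
            self.gen = (self.gen + 1) % self.num_gens
        for _ in range(2):                    # scan two cells per step, then resume later
            self._sweep(self.scan)
            self.scan = (self.scan + 1) % len(self.buckets)

    def lookup(self, x: int) -> bool:
        f = self._h(x)
        g = self.buckets[f % len(self.buckets)].get(f)
        return g is not None and self._is_active(g)

dsf = DeamortizedSlidingFilter(n=20, c=4, eps=0.001)
for x in range(300):
    dsf.insert(x)
```

A full pass over the cells takes about $n'/2$ insertions, while a stale generation number is reused only after $c + 1$ further generations, that is $n'$ insertions, so in this sketch every stale entry is removed before its label could be mistaken for an active one.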

We discuss implementing these requirements in the construction of [ANS10] (see pseudo-code on page 9 of their paper). Their hashing scheme is based on two-level hashing: the first level consists of an array $T_0$ of bins of size $d$ and the second level consists of cuckoo hashing which includes two arrays, $T_1$ and $T_2$, and a queue, $Q$. The cells are the $d$ cells in each bin of $T_0$, and the cells of $T_1$, $T_2$ and $Q$. Each element is implicitly stored in a unique cell in one of the components. Scanning the cells is achieved by going over the cells of each component and saving an index of the current component and cell within the component. The delete procedure is simple and does not involve moving cells. The insert procedure is more involved and may move cells from one component to another, e.g. a cell from $Q$ might be moved to $T_0$. Since the running time is constant, so is the number of accessed elements. The procedure can be easily modified such that before a cell is accessed it is first scanned, and deleted if old. After these modifications, the dictionary of [ANS10] supports all needed requirements and hence this completes our construction of an $(n, \varepsilon)$-Sliding Bloom Filter.

3 A Tight Space Lower Bound 

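In this section we present a matching space lower bound to our construction. We restate and prove Theorem 1.2.

Theorem 1.2 (Restated). For any $m > 0$, $\varepsilon = o(1)$, sufficiently large $n$ and an $(n, m, \varepsilon)$-sliding Bloom filter $A$, if we assume that for any stream $\sigma$ it holds that $\Pr[\exists i \le 3n : |\{x \in U : A(\sigma_i, x) = \text{'Yes'}\}| \ge 3\varepsilon u] \le \frac{1}{2}$ then

1. $|A| \ge n \log \frac{1}{\varepsilon} + \max \left\{ n \log\log \frac{1}{\varepsilon},\ n \log \frac{n}{m} \right\} + O(n)$

2. If $m = \infty$ then $|A| \ge n \log \frac{1}{\varepsilon} + n \log\log \frac{1}{\varepsilon} + O(n)$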

Proof. Let $A$ be an algorithm breaking the space requirement in the statement of the theorem. The main idea of the proof is to use $A$ to encode and decode a set $S \subseteq U$ and a permutation $\pi$ on the set (i.e. an ordered set). Giving $S$ to $A$ as a stream, ordered by $\pi$, encodes an approximation of $S$ and $\pi$. The decoder uses the encoding of $A$ to compute all the elements of $U$ for which $A$ answers 'Yes'. Then we encode only the elements of $S$ from within this set. To decode $\pi$, the decoder checks how many elements need to be added to the stream in order for $A$ to "release" each of the elements in $S$ (that is, to answer 'No' on it). Then we encode only the difference between the location $i$ and the location at which it has been released.

Denote by $A_r$ the algorithm with fixed random string $r$ and let $\mu_{A_r}(\sigma) = \{x : A_r(\sigma, x) = \text{'Yes'}\}$. We show that w.l.o.g. we can consider $A$ to be deterministic. Since $\Pr_r[\exists i \le 3n : |\mu_{A_r}(\sigma_i)| \ge 3\varepsilon u] \le \frac{1}{2}$ we fix $r^*$ to be such that for all $1 \le i \le 3n$ it holds that $|\mu_{A_{r^*}}(\sigma_i)| \le 3\varepsilon u$.

Notice that $r^*$ need not be explicitly specified in the encoding since the decoder can compute it using only the algorithm of $A$. From now on, we assume that $A$ is deterministic (and remove the $A_r$ notation) and assume that for any $\sigma$ of length $2n$ we have that $|\mu_A(\sigma)| \le 3\varepsilon u$. We now make an important definition:

$$\ell(\sigma, x) = \min\left\{ \operatorname*{argmin}_{k} \left\{ \exists y_1, \ldots, y_k \in U : A(\sigma y_1 \cdots y_k, x) = \text{'No'} \right\},\ n,\ m \right\}$$

$\ell(\sigma, x)$ is the minimum number of elements needed to be added to $\sigma$ such that $A$ answers 'No' on $x$. Notice that $\ell(\sigma, \cdot)$ can be computed for any set $S$ given the representation of $A(\sigma)$.

We encode any set $S$ of size $2n$ and a permutation $\pi : [2n] \to [2n]$ using $A$. After encoding $S$ we compare the encoding length to the entropy lower bound of $B(u,2n) + \log((2n)!)$. Consider applying $\pi$ on (some canonical order of) the elements of $S$ and let $x_1,\dots,x_{2n}$ be the resulting elements of $S$ ordered by $\pi$. For any $i > 2n$ let $x_i = x_{i-2n}$; then for any $k \ge 1$ define the sequence $\sigma_k = x_1,\dots,x_k$. Let $\phi(\sigma_k) = \mu(\sigma_k) \cap S$ and define

$$\Delta(\sigma_k, i) = \ell(\sigma_k, x_i) + (k-n) - i .$$

Notice that, given $A(\sigma_k)$, $\Delta(\sigma_k,i)$, $k$ and $n$ one can compute the position $i$ of the element $x_i$. Define

$$\lambda_k = \sum_{i=k-n+1}^{k} \Delta(\sigma_k,i), \qquad \lambda = \max_{n \le k \le 3n} \lambda_k .$$

Notice that if $m \ge n$ (or $m = \infty$) then $0 \le \lambda \le n^2$, otherwise $0 \le \lambda \le nm$.
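The position-recovery step can be illustrated with a toy sketch. It assumes an exact sliding-window set (no false positives and no slack, so every $\Delta$ is zero) in place of the filter $A$, and it ignores the caps at $n$ and $m$ in the definition of $\ell$. The names `ExactSlidingSet`, `release_time` and `recover_position` are hypothetical and not from the paper.

```python
from collections import deque


class ExactSlidingSet:
    """Toy stand-in for A: exact membership over the last n stream elements."""

    def __init__(self, n, window=()):
        self.n = n
        self.window = deque(window, maxlen=n)

    def insert(self, x):
        self.window.append(x)  # the oldest element drops out automatically

    def query(self, x):
        return x in self.window

    def copy(self):
        return ExactSlidingSet(self.n, self.window)


def release_time(state, x):
    """Count the fresh elements appended before `state` answers 'No' on x."""
    probe = state.copy()  # never mutate the encoded state itself
    steps = 0
    while probe.query(x):
        probe.insert(("dummy", steps))  # fresh element, distinct from the stream
        steps += 1
    return steps


def recover_position(state, x, k, n, delta=0):
    """Invert Delta = ell + (k - n) - i to obtain the stream position i."""
    return release_time(state, x) + (k - n) - delta


n, k = 4, 7
stream = [f"x{i}" for i in range(1, k + 1)]
A = ExactSlidingSet(n)
for x in stream:
    A.insert(x)

# Recover the positions of the n elements currently in the window.
positions = {x: recover_position(A, x, k, n) for x in stream[k - n:]}
print(positions)
```

For an approximate filter the release may come later than the window boundary, and the encoder would have to supply the corresponding non-zero $\Delta(\sigma_k, i)$ values through the `delta` argument.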
Lemma 3.1. There exists a $k'$, $n \le k' \le 4n$, such that $|\phi(\sigma_{k'})| \ge n + \frac{\lambda}{n}$.
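Proof. Let $k^*$ be such that $\lambda = \lambda_{k^*}$ and consider $\sigma_{k^*}$. By an averaging argument, it is enough to show that

$$\sum_{j=k^*}^{k^*+n} |\phi(\sigma_j)| \ge n^2 + \lambda .$$

For any $k^*-n+1 \le i \le k^*$ we know that $x_i \in \sigma_{k^*}(n)$, and by the definition of $\ell(\sigma_{k^*},x_i)$ we know that $x_i \in \phi(\sigma_{k^*}),\dots,\phi(\sigma_{k^*+\ell(\sigma_{k^*},x_i)})$. For any $k^*+1 \le i \le k^*+n-1$ we know that $x_i \in \sigma_{k^*+n-1}(n)$ and therefore $x_i \in \phi(\sigma_i),\dots,\phi(\sigma_{k^*+n-1})$. Instead of summing over $\phi(\sigma_j)$ we sum over $x_i$ and count the number of $\phi(\sigma_j)$ such that $x_i \in \phi(\sigma_j)$:

$$\begin{aligned}
\sum_{j=k^*}^{k^*+n} |\phi(\sigma_j)|
&\ge \sum_{i=k^*-n+1}^{k^*} \big(\ell(\sigma_{k^*},x_i) + 1\big) + \sum_{i=k^*+1}^{k^*+n-1} (k^*+n-i) \\
&= \sum_{i=k^*-n+1}^{k^*} \big(\ell(\sigma_{k^*},x_i) + k^* - n - i\big) + \sum_{i=k^*-n+1}^{k^*} (i + n - k^* + 1) + \frac{n(n-1)}{2} \\
&= \sum_{i=k^*-n+1}^{k^*} \Delta(\sigma_{k^*},i) + \frac{n(n+3)}{2} + \frac{n(n-1)}{2} \\
&\ge \lambda_{k^*} + n^2 = n^2 + \lambda .
\end{aligned}$$

□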

Fix $k'$ from Lemma 3.1. Since $\lambda$ is the maximum of all the $\lambda_k$'s, we know also that $\lambda_{k'} \le \lambda$. We include the memory representation of $A(\sigma_{k'})$ in the encoding. The decoder uses this to compute the set $\mu(\sigma_{k'})$, for which the absolute false positive definition gives $|\mu(\sigma_{k'})| \le 3\varepsilon u$. By Lemma 3.1 we know that $|\phi(\sigma_{k'})| \ge n + \frac{\lambda}{n}$, therefore we use $B(3\varepsilon u, n + \frac{\lambda}{n})$ bits to encode $n + \frac{\lambda}{n}$ elements of $S$ out of them. The remaining $n - \frac{\lambda}{n}$ elements are encoded explicitly using $B(u, n - \frac{\lambda}{n})$ bits. This completes the encoding of $S$.

To encode $\pi$ we need the decoder to be able to extract $i$ for each $x_i$. For any $x_i \in \sigma_{k'}(n)$ the decoder uses $A(\sigma_{k'})$ and computes $\ell(\sigma_{k'}, x_i)$. Now, in order for the decoder to exactly decode $i$ we need to encode all the $\Delta(\sigma_{k'}, i)$'s. Since $\sum_{i=k'-n+1}^{k'} \Delta(\sigma_{k'}, i) \le \lambda$ we can encode all the $\Delta(\sigma_{k'}, i)$'s using $\log\binom{\lambda+n}{n}$ bits (balls and urns method), and the remaining elements' positions will be explicitly encoded using $n \log n$ bits. Denote by $|A|$ the number of bits used by the algorithm $A$. Comparing the encoding length to the entropy lower bound we get



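$$|A| + \log\binom{3\varepsilon u}{n + \frac{\lambda}{n}} + \log\binom{u}{n - \frac{\lambda}{n}} + \log\binom{\lambda + n}{n} + n\log n \;\ge\; \log\binom{u}{2n} + \log((2n)!)$$

and therefore

$$|A| \ge \Big(n + \frac{\lambda}{n}\Big)\log\frac{1}{\varepsilon} + \Big(n + \frac{\lambda}{n}\Big)\log n + \Big(n - \frac{\lambda}{n}\Big)\log\Big(n - \frac{\lambda}{n}\Big) - n\log(\lambda + n) + O(n) .$$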
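The balls-and-urns count behind the $\log\binom{\lambda+n}{n}$ term can be checked by brute force for small parameters. The sketch below is only an illustration: the helper names are hypothetical, and it assumes $\lambda$ is an integer and each $\Delta$ value is a non-negative integer.

```python
from itertools import product
from math import ceil, comb, log2


def count_bounded_tuples(n, lam):
    """Brute-force count of n-tuples of non-negative integers with sum <= lam."""
    return sum(1 for t in product(range(lam + 1), repeat=n) if sum(t) <= lam)


def bits_needed(n, lam):
    """Bits sufficient to index one such tuple: ceil(log2 C(lam + n, n))."""
    return ceil(log2(comb(lam + n, n)))


# The number of admissible tuples equals C(lam + n, n) (stars and bars with
# one extra slack variable), so an index into that list identifies the tuple.
for n, lam in [(2, 3), (3, 4), (4, 5)]:
    assert count_bounded_tuples(n, lam) == comb(lam + n, n)

print(bits_needed(2, 3))
```

An encoder would rank the tuple of $\Delta$ values among all admissible tuples and write that rank in binary; the decoder inverts the ranking.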



Consider two possible cases for $\lambda$. If $\lambda \le 0.9n^2$ then we get

$$|A| \ge \Big(n + \frac{\lambda}{n}\Big)\log\frac{1}{\varepsilon} + 2n\log n - n\log(\lambda + n) + O(n) .$$



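The minimum of this expression, as a function of $\lambda$, is achieved at $\lambda = \frac{n^2}{\log\frac{1}{\varepsilon}} - n$. If $m \ge \frac{n}{\log\frac{1}{\varepsilon}} - 1$ then the minimum can be achieved and we get that

$$|A| \ge n\log\frac{1}{\varepsilon} + n\log\log\frac{1}{\varepsilon} + O(n) .$$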
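The minimization step can be spelled out as follows. This is a sketch under the assumption that logarithms are base 2; the name $f$ for the right-hand side (without its $O(n)$ term) is introduced here only for illustration.

```latex
\begin{align*}
f(\lambda) &= \Big(n + \frac{\lambda}{n}\Big)\log\frac{1}{\varepsilon} + 2n\log n - n\log(\lambda + n), \\
f'(\lambda) &= \frac{1}{n}\log\frac{1}{\varepsilon} - \frac{n}{(\lambda + n)\ln 2}, \qquad
f''(\lambda) = \frac{n}{(\lambda + n)^2 \ln 2} > 0 .
\end{align*}
% f is convex, so its stationary point is the global minimum:
%   \lambda + n = n^2 / (\ln 2 \cdot \log(1/\varepsilon)).
% This agrees with \lambda + n = n^2 / \log(1/\varepsilon) up to the constant
% factor 1/\ln 2, which changes f only by O(n). Substituting the latter value:
\begin{align*}
f\Big(\frac{n^2}{\log\frac{1}{\varepsilon}} - n\Big)
&= n\log\frac{1}{\varepsilon} + \Big(\frac{n}{\log\frac{1}{\varepsilon}} - 1\Big)\log\frac{1}{\varepsilon}
   + 2n\log n - n\log\frac{n^2}{\log\frac{1}{\varepsilon}} \\
&= n\log\frac{1}{\varepsilon} + n\log\log\frac{1}{\varepsilon} + n - \log\frac{1}{\varepsilon} .
\end{align*}
```

The trailing $n - \log\frac{1}{\varepsilon}$ is absorbed into the $O(n)$ term whenever the minimizer is non-negative, that is, whenever $\log\frac{1}{\varepsilon} \le n$.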

Otherwise, if $m < \frac{n}{\log\frac{1}{\varepsilon}} - 1$ then $\lambda \le mn < \frac{n^2}{\log\frac{1}{\varepsilon}} - n$ and the minimum value will be achieved at $\lambda = nm$, which yields the required lower bound:

$$|A| \ge n\log\frac{1}{\varepsilon} + n\log\frac{n}{m} + O(n) .$$

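If $0.9n^2 \le \lambda \le n^2$ then $m \ge 0.9n \ge \frac{n}{\log\frac{1}{\varepsilon}}$. Thus, we get that

$$|A| \ge \Big(n + \frac{\lambda}{n}\Big)\log\frac{1}{\varepsilon} - \Big(n - \frac{\lambda}{n}\Big)\log n + \Big(n - \frac{\lambda}{n}\Big)\log\Big(n - \frac{\lambda}{n}\Big) + O(n) .$$

The minimum of this expression, as a function of $\lambda$ in the given range, is achieved at $\lambda = 0.9n^2$, which yields

$$|A| \ge n\log\frac{1}{\varepsilon} + n\log\log\frac{1}{\varepsilon} + O(n)$$

as required. □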
A Removing the Absolute False Positive Assumption

We describe how to eliminate the absolute false positive assumption in the lower bound, which yields a slightly weaker result.

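Theorem A.1. For any $m > 0$, $\delta > 0$, $\varepsilon = o(1)$ and sufficiently large $n$, if $A$ is an $(n,m,\varepsilon)$-sliding Bloom filter then

1. $|A| \ge n\log\frac{1}{\varepsilon} + (1-\delta)\max\{n\log\log\frac{1}{\varepsilon},\; n\log\frac{n}{m}\} + O(n)$

2. If $m = \infty$ then $|A| \ge n\log\frac{1}{\varepsilon} + (1-\delta)\,n\log\log\frac{1}{\varepsilon} + O(n)$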



Proof (sketch). The proof of this theorem is similar to the proof of Theorem 1.2, and we describe only the needed modification. The absolute false positive assumption assures us that with high probability for any $i \le 3n$ we have $|\mu(\sigma_i)| \le 3\varepsilon u$, and hence we fix some $r^*$ for which this holds. Since we no longer use the assumption, we instead show that there exists some $r^*$ for which the condition holds for most $i$'s.

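Claim A.2. There exists some $r^*$ such that for any $\sigma$ of length $2n$: $|\{i : |\mu_{A_{r^*}}(\sigma_i)| > 11\varepsilon u\}| \le \frac{n}{5}$.

Proof. By the requirement of a sliding Bloom filter we get that for any $i$, $E_r[|\mu_{A_r}(\sigma_i)|] \le 2n + \varepsilon u$. Then by Markov's inequality we get $\Pr_r[|\mu_{A_r}(\sigma_i)| > 10(2n + \varepsilon u)] \le \frac{1}{10}$. Assuming $u$ is large enough relative to $n$ that $20n \le \varepsilon u$, we have $10(2n + \varepsilon u) \le 11\varepsilon u$ and hence $\Pr_r[|\mu_{A_r}(\sigma_i)| > 11\varepsilon u] \le \frac{1}{10}$. Define $I = E_r[|\{i \le 2n : |\mu_{A_r}(\sigma_i)| > 11\varepsilon u\}|]$ to be the expected number of 'bad' indexes, i.e. those that have too many false positives. Then by linearity of expectation we get that $I = \sum_{i \le 2n} \Pr_r[|\mu_{A_r}(\sigma_i)| > 11\varepsilon u] \le 2n \cdot \frac{1}{10} = \frac{n}{5}$. Thus, there must be a specific $r^*$ such that $|\{i : |\mu_{A_{r^*}}(\sigma_i)| > 11\varepsilon u\}| \le \frac{n}{5}$. □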



Fix $k'$ from Lemma 3.1 of the proof. The problem is that Lemma 3.1 does not assure that $k'$ is not in the set of 'bad' indexes, where the false positive rate is too high. We need to modify the lemma to compute the average over all indexes that are not bad. This results in a weaker version of the lemma stating that $|\phi(\sigma_{k'})| \ge c\,(n + \frac{\lambda}{n})$ for a fixed constant $c < 1$. Of course, the choice of $c$ is arbitrary and could easily be changed to $(1-\delta)$ for any $\delta > 0$, resulting in a larger constant in the $O(n)$ term. This increases the number of bits needed for the encoding and results in the lower bound:

$$|A| \ge n\log\frac{1}{\varepsilon} + (1-\delta)\,n\log\log\frac{1}{\varepsilon} + O(n) .$$

□